The Haskell type Char represents Unicode code points. I was struggling to output certain Unicode characters from Haskell code in Windows Terminal, under Windows 11. Specifically:
    module Main where

    main :: IO ()
    main = putStrLn "≣≣≣≣" -- U+2263 (Strictly Equivalent To)
either did not work at all or, when the code page was changed in PowerShell Core 7.2.1 from the default OEM code page 437 to 65001 (UTF-8) with the command chcp 65001, output ΓëúΓëúΓëúΓëú (in hex, the bytes CE 93 C3 AB C3 BA, repeated).
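When chasing this kind of problem, it can help to see which text encoding GHC's runtime has picked for standard output. The following is a minimal diagnostic sketch, not part of the original investigation; the function name describeEncodings is made up for illustration, while hGetEncoding and localeEncoding are exported by System.IO in base.

```haskell
import System.IO (hGetEncoding, localeEncoding, stdout)

-- | Describe the encoding attached to stdout and the runtime's
-- locale default. (describeEncodings is a hypothetical helper name.)
describeEncodings :: IO String
describeEncodings = do
  out <- hGetEncoding stdout  -- Nothing means the handle is in binary mode
  pure ("stdout: " ++ maybe "binary (no encoding)" show out
        ++ ", locale default: " ++ show localeEncoding)
```

In GHCi, describeEncodings >>= putStrLn prints the result. The names reported depend on the platform and the active code page, so no particular output is assumed here.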
The UTF-8 encoding of U+2263 is E2 89 A3. In code page 437, that sequence of bytes corresponds to the Unicode characters U+0393 (Γ), U+00EB (ë) and U+00FA (ú). The UTF-8 encoding of U+0393, U+00EB and U+00FA is CE 93 C3 AB C3 BA. So, what was being output was, apparently, UnicodeToUTF-8( CP437ToUnicode( UnicodeToUTF-8( ‘≣’ ) ) ).
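The byte arithmetic above can be checked with a small sketch that uses only base. The names utf8Encode, cp437Decode, doubleEncode and hex are made up for illustration, and cp437Decode is deliberately partial: it knows only the three code page 437 bytes discussed here, not the whole table.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord, toUpper)
import Data.Word (Word8)
import Numeric (showHex)

-- | UTF-8 encode a single code point.
utf8Encode :: Char -> [Word8]
utf8Encode c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [lead 0xC0 6, cont 0]
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading byte: length prefix plus the top bits of the code point.
    lead prefix s = fromIntegral (prefix .|. (n `shiftR` s))
    -- Continuation byte: 10xxxxxx carrying six bits.
    cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))

-- | Partial code page 437 decoder: only the three bytes needed here.
cp437Decode :: Word8 -> Maybe Char
cp437Decode b = lookup b
  [ (0xE2, '\x0393')  -- Γ
  , (0x89, '\x00EB')  -- ë
  , (0xA3, '\x00FA')  -- ú
  ]

-- | UTF-8 encode, misread the bytes as code page 437, UTF-8 encode again.
-- Returns Nothing if a byte is outside the partial table above.
doubleEncode :: Char -> Maybe [Word8]
doubleEncode c = concatMap utf8Encode <$> traverse cp437Decode (utf8Encode c)

-- | Render bytes as space-separated, two-digit upper-case hex.
hex :: [Word8] -> String
hex = unwords . map byteHex
  where
    byteHex b = map toUpper (pad (showHex b ""))
    pad s = if length s < 2 then '0' : s else s
```

In GHCi, hex (utf8Encode '\x2263') gives "E2 89 A3", and fmap hex (doubleEncode '\x2263') gives Just "CE 93 C3 AB C3 BA", matching the bytes observed in the terminal.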
Solution
The solution was, first, to choose Change system locale... from the Administrative tab of the Region dialog, accessed from Administrative language settings under Time & language > Language & region.
Second, to tick the checkbox ‘Beta: Use Unicode UTF-8 for worldwide language support’ on the resulting Region Settings dialog.
This is said to set the values ACP, MACCP and OEMCP under the Registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage to 65001 (type REG_SZ). The values are said to indicate: ACP, the default ‘ANSI’ code page; MACCP, the default Macintosh code page; and OEMCP, the default OEM code page. Console windows use the OEM code page. Legacy, non-Unicode GUI-subsystem applications use the ‘ANSI’ code page.